#### Architecture Level Power Reduction Method for Configurable Processor Generation

Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, and Masaharu Imai

Graduate School of IST Osaka University, Japan

2008/06/24

©Integrated System Design Lab.

## Outline

- Introduction
- Clock Gating
- VLIW Processor Generation Flow
- Extracting Non-Redundant Activation Conditions (NRAC)
- Experimental Results
- Conclusion


# Background



## **VLIW Processors**

- Advantages
  - High Performance due to Instruction-Level Parallelism (ILP)
  - Less Power Consumption than Superscalar Processors
- Design Constraints: Area, Delay, Power
- Issues to be Solved
  - Better Design Space Exploration Methodology for Low Power VLIW Processors
  - Better Power Reduction Method for Configurable VLIW Processors


# **Clock Gating**
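As a rough illustration of the idea behind clock gating, the sketch below compares a register that is clocked every cycle (holding its value through an enable multiplexer) with one whose clock is gated by the same enable. The function name, the data values, and the activity trace are hypothetical and are not taken from the slides; the point is only that both registers end with the same value while the gated one sees far fewer clock events.

```python
def run_register(data, enables, gated):
    """Simulate one register over several cycles.

    data[i]    : value presented at the register input in cycle i
    enables[i] : True when the register must capture data[i]
    gated      : True  -> the clock reaches the register only when enabled
                 False -> the register is clocked every cycle and an
                          enable mux recirculates the old value
    Returns (final_value, number_of_clock_events_at_the_register).
    """
    value = 0
    clock_events = 0
    for d, en in zip(data, enables):
        if gated:
            if en:
                clock_events += 1
                value = d
        else:
            clock_events += 1
            value = d if en else value
    return value, clock_events


# Hypothetical trace: the register is written in 3 of 8 cycles.
data = [5, 7, 9, 11, 13, 15, 17, 19]
enables = [True, False, False, True, False, False, False, True]

print(run_register(data, enables, gated=False))  # (19, 8)
print(run_register(data, enables, gated=True))   # (19, 3)
```

The tighter the enable (activation) condition, the fewer clock events remain, which is why the quality of the gating signal matters.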




## Related Work (1)

- Power Compiler, Synopsys Inc.
  - Clock Gating Insertion Tool from RTL Descriptions
  - Power Compiler does not Modify Gating Signals
- Monteiro, J.C., et al.: "Implicit FSM Decomposition applied to Low-Power Design," IEEE Trans. on VLSI Systems, Vol. 10, Issue 5, pp. 560-565, Oct. 2002.
  - Improves Gating Signals by Reforming FSM

Controller based on FSM is NOT Suitable for Pipeline Processors


## Related Work (2)

- Babighian, P., et al.: "A Scalable Algorithm for RTL Insertion of Gated Clocks based on ODCs Computation," IEEE Trans. on CAD of Integrated Circuits and Systems, Vol. 24, Issue 1, pp. 29-42, Jan. 2005
  - Improves Gating Signals by Calculating Observability Don't Care Conditions
  - Causes Enormous Amount of Area Overhead
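To make the notion of an observability don't care (ODC) condition concrete, the following sketch enumerates, for a small Boolean function, the input assignments under which one chosen input cannot affect the output. The 2:1 multiplexer and the helper names are illustrative assumptions; this is a brute-force truth-table check, not the scalable algorithm of the cited paper.

```python
from itertools import product


def mux(sel, a, b):
    """2:1 multiplexer: selects a when sel == 0, otherwise b."""
    return a if sel == 0 else b


def odc_of_input(func, names, target):
    """Return the assignments of the other inputs under which
    flipping `target` does not change the output of `func`."""
    others = [n for n in names if n != target]
    dont_cares = []
    for values in product((0, 1), repeat=len(others)):
        env = dict(zip(others, values))
        out0 = func(**env, **{target: 0})
        out1 = func(**env, **{target: 1})
        if out0 == out1:
            dont_cares.append(env)
    return dont_cares


# Input b is unobservable whenever sel == 0, regardless of a.
odc_b = odc_of_input(mux, ["sel", "a", "b"], "b")
print(odc_b)  # [{'sel': 0, 'a': 0}, {'sel': 0, 'a': 1}]
```

A register that only feeds `b` would not need to be updated in cycles where `sel == 0`, which is the kind of condition an ODC-based approach can exploit for gating.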



# **VLIW Processor Generation Flow**




## **NRAC** Extraction

- RCG (Resource Connection Graph) Extraction
- Merging RCG
- Signal Conflict Resolution
- Pipelining


# **RCG** Extraction



# Merging RCG



#### New Conditions of Merged Data Transfer

The Data Transfer Conditions are Unified
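One way to picture the unification of data transfer conditions is sketched below. It assumes each instruction contributes a small graph of resource-to-resource transfers, and that a transfer shared by several instructions takes the union (logical OR) of their conditions. The instruction names, resource names, and both helper functions are hypothetical. The final step, which derives a register's activation condition from its incoming transfers, is an assumption about how such conditions might be used, not the extraction procedure from the slides.

```python
from collections import defaultdict


def merge_rcgs(per_instruction_rcgs):
    """Merge per-instruction transfer graphs.

    Each merged edge (src, dst) records the set of instructions that
    use it; the set stands for the OR of those instructions' conditions.
    """
    merged = defaultdict(set)
    for instruction, edges in per_instruction_rcgs.items():
        for src, dst in edges:
            merged[(src, dst)].add(instruction)
    return dict(merged)


def activation_conditions(merged):
    """For each destination resource, union the conditions of all
    transfers that write into it (assumed usage, see lead-in)."""
    activation = defaultdict(set)
    for (_src, dst), instructions in merged.items():
        activation[dst] |= instructions
    return dict(activation)


# Hypothetical instruction set and resources.
rcgs = {
    "ADD": [("RF", "ALU"), ("ALU", "EX_MEM")],
    "SUB": [("RF", "ALU"), ("ALU", "EX_MEM")],
    "LD": [("RF", "AGU"), ("AGU", "EX_MEM"), ("DMEM", "MEM_WB")],
}

merged = merge_rcgs(rcgs)
activation = activation_conditions(merged)
print(sorted(merged[("RF", "ALU")]))  # ['ADD', 'SUB']
print(sorted(activation["MEM_WB"]))   # ['LD']
```

In this toy example the `MEM_WB` register would only need its clock when a `LD` is in flight, while `EX_MEM` is written by all three instructions.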



#### NRAC Extraction from Data Transfer Condition



#### NRAC for Pipeline Register Activation (1)



#### NRAC for Pipeline Register Activation (2)




## **Total Power Reduction**




#### Area vs. the Number of Parallel Issue Slot

| Processor        | 2 Issue Slots | 3 Issue Slots | 4 Issue Slots |
|------------------|---------------|---------------|---------------|
| Non-clock-gated  | 70,280        | 113,083       | 168,929       |
| Power Compiler   | 63,589        | 101,890       | 152,517       |
| Proposed Method  | 63,857        | 102,346       | 153,366       |
| Overhead (gates) | 269           | 456           | 848           |
| Overhead (%)     | 0.42%         | 0.45%         | 0.56%         |

(Unit: Gates)

Area Overhead is Negligible
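The overhead rows can be checked with a few lines of arithmetic, taking the overhead as the difference between the Proposed Method and Power Compiler gate counts. The variable and function names below are illustrative only. Recomputing from the listed totals gives 268 and 849 gates for the 2- and 4-slot cases, one gate off the table's 269 and 848, possibly because the listed totals are rounded; the percentages agree.

```python
# Gate counts copied from the table, keyed by number of issue slots.
power_compiler = {2: 63_589, 3: 101_890, 4: 152_517}
proposed = {2: 63_857, 3: 102_346, 4: 153_366}


def area_overhead(baseline, candidate):
    """Return (extra gates, extra gates as a percentage of baseline)."""
    extra = candidate - baseline
    return extra, 100.0 * extra / baseline


for slots in sorted(proposed):
    gates, percent = area_overhead(power_compiler[slots], proposed[slots])
    print(f"{slots} issue slots: +{gates} gates ({percent:.2f}%)")
# 2 issue slots: +268 gates (0.42%)
# 3 issue slots: +456 gates (0.45%)
# 4 issue slots: +849 gates (0.56%)
```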


#### Area Reduction by Clock Gating




## Conclusion

- A Low Power VLIW Processor Generation Method has been Proposed
- Experimental Results Show
  - Efficient Power Reduction
    - ~60% Less Power than Non-Clock-Gating
    - ~35% Less Power than Power Compiler
  - Power Reduction of Pipeline Register is Dominant
    - ~70% Less Power than Non-Clock-Gating
    - ~60% Less Power than Power Compiler
  - Area Overhead is Negligible
    - 0.5% More Area than Power Compiler